Since
2^47 = 140,737,488,355,328 and 26^10 = 141,167,095,653,376
ten letters can represent 47 bits with fairly good efficiency. In fact, the information content of 10 letters from a 26-character alphabet amounts to 47.00439718 bits, less than 1/227th of a bit extra.
However, one would like to avoid performing 48-bit multiplications to do the conversion.
A method invented for IBM by Tien Chi Chen and Irving Tze Ho for representing three decimal digits in 10 binary bits, which has since become known as Chen-Ho encoding, can illustrate a way in which to do this.
Let ooo (or ppp, qqq) represent a digit from 0 to 7 represented by the following code:
000 0 001 1 010 2 011 3 100 4 101 5 110 6 111 7
and let d (or e, f) represent a digit from 8 to 9 in the following code:
0 8 1 9
then, any three decimal digits can be represented within 10 bits as follows:
0ooopppqqq (3 digits with 3 bits - 1 way) 100dpppqqq (2 digits with 3 bits, 1 digit with 1 bit - 3 ways) 101oooeqqq 110ooopppf 11100deqqq (1 digit with 3 bits, 2 digits with 1 bit - 3 ways) 11101dpppf 11110oooef 1111100def (3 digits with 1 bit - 1 way)
with 24 unused combinations remaining.
This method of encoding was patented by IBM, in U.S. patent 3,842,414. This patent was filed on June 18, 1973, and granted on October 15, 1974. Also, it was described in a paper in the January 1975 issue of Communications of the Association for Computing Machinery. (I thought I first encountered this method in a paper in the IBM Systems Journal, but my memory must be playing tricks on me.)
A further discussion of the encoding of decimal digits using this technique which was formerly found on this page has been moved here, where it seemed to fit better with the topic of the page.
It may or may not already be obvious, but the table illustrating the decimal to binary code is based on ooo, ppp, and qqq representing the first, second, and third digits respectively (if they are from 0 to 7) and on d, e, and f representing the first, second, and third digits respectively if they are instead either 8 or 9. This same notational convention will be used within the description of the coding that takes 47 bits to ten letters.
Here, we have some binary values left over, so we can go from decimal to binary, but we cannot use this code to represent arbitrary binary data as decimal digits without some modification.
With the lengths that I am choosing, it will be the other way around for letters; we will use all possible binary combinations, and still have a few alphabetic combinations left over.
Because the conversion from 47 bits to 10 letters is very efficient, a direct coding, since it would use a large number of possible codes, would be very complicated. Instead, to derive a coding the description of which would be reasonable in length, I have chosen to describe a method that has a two-layer structure.
First, we will represent 14 binary bits as three letters.
aaaa (or bbbb, cccc) will represent a letter from the code:
0000 A 0001 B 0010 C 0011 D 0100 E 0101 F 0110 G 0111 H 1000 I 1001 J 1010 K 1011 L 1100 M 1101 N 1110 O 1111 P
iii (or jjj, kkk) will represent a letter from the code:
000 Q 001 R 010 S 011 T 100 U 101 V 110 W 111 X
and m (or n, o) will represent a letter from the code:
0 Y 1 Z
Then, a coding generating three letters from 14 bits looks like the following:
00aaaabbbbcccc (4,4,4: 1 way) 010iiibbbbcccc (3,4,4: 3 ways) 011aaaajjjcccc 100aaaabbbbkkk 1010iiijjjcccc (3,3,4: 3 ways) 1011iiibbbbkkk 1100aaaajjjkkk 11010iiijjjkkk (3,3,3: 1 way) 11011mbbbbcccc (1,4,4: 3 ways) 11100aaaancccc 11101aaaabbbbo 111100mjjjcccc (1,3,4: 6 ways: 4 used here.) 111101mbbbbkkk 111110iiincccc 111111aaaankkk
This shorter coding of 14 bits as three letters is not quite as efficient as the overall coding of 47 bits to ten letters we will construct using it as a component. 3 times 14 is 42, and so there are five bits left, which don't quite fit in one letter. If the first five bits represent a number from 0 to 25, then they can represent a letter, using the codings shown above, in one of the forms 0aaaa, 10iii, or 1100m. Otherwise, codings which use the combinations of three letters which are not used by the above coding of fourteen bits to three letters are used. These codings don't encode fourteen bits, since only a small fraction of the possible combinations of three letters are left.
Some more codings, using the unused letter combinations, are now constructed for use in producing a coding of 47 bits into ten letters that is manageable to describe, since a direct coding is very long and complicated.
There are still some possible arrangements of three letters left over, although the number is now rather small. 10 bits can still be encoded in those which remain, as follows:
00iiibbbbo (1,3,4: 6 ways: 2 remaining for use here) 01aaaajjjo 100mjjjkkk (1,3,3: 3 ways) 101iiinkkk 110iiijjjo 1110mncccc (1,1,4: 3 ways: 2 used here) 1111mbbbbo
And 7 bits can still be encoded in the few now remaining:
0aaaano (1,1,4: 3 ways: 1 remaining for use here) 10mnkkk (1,1,3: 3 ways: 2 used here) 11mjjjo
The remaining combinations make 5:
iiino (1,1,3: 3 ways: 1 remaining for use here)
and 3 bits:
mno (1,1,1: 1 way)
To encode 47 bits as ten letters, we now proceed as follows, using the codings that we've constructed so far.
aaaa, iii, or m represent the first of the 10 letters.
The remaining letters are divided into three groups of three, coded using the method above.
Note that the codings already constructed produce a different number of binary bits, either 14, 10, 7, 5, or 3 bits, from every possible sequence of three letters, and no sequence of three letters is used twice. Similarly, for the first of the ten letters, either 4 bits, 3 bits, or 1 bit is produced from that letter, and none of the 26 letters are used twice; 16 of them produce a 4 bit sequence, 8 of them produce a 3 bit sequence, and 2 of them produce a 1 bit sequence. Thus, given a sequence of 10 letters, it is possible, using those codings, not only to decode them uniquely to a string of from 46 to 10 bits, but also to have additional information those bits do not contain: the exact breakdown of which codings were used. Appending prefix bits indicating the coding to use for the different combinations of codings is what allows all combinations of 47 bits to be encoded.
In one important respect, the coding described here is the reverse of a Chen-Ho encoding; in Chen-Ho encoding, all combinations of 3 digits are represented as 1,000 out of the 1,024 possible combinations of 10 bits. Here, not all possible combinations of ten letters are represented in 47 bits; instead, it is all possible combinations of 47 bits which are represented as 140,737,488,355,328 out of the 141,167,095,653,376 possible combinations of 10 letters, but the code is still described as a binary coding of letters even though it is used as a coding of bits in letters and not as a coding of letters in bits.
For the first group of three, the 14, 10, 7, and 5 bit codings are represented by AAAAAAAAAA, FFFFFF, PPP, and X respectively; for the second, BBBBBBBBBB, GGGGGG, QQQ, and Y; for the third, CCCCCCCCCC, HHHHHH, RRR, and Z. The 3 bit coding is not required. Note that the length of the strings is equal to the length of the coding in bits minus 4, so that the length of the symbols used here are constant.
The table below has been revised from its original form. I would like to thank Mr. Ross Presser for bringing to my attention the omission of a portion of the possible space of binary inputs. This has been corrected, and an inconsistency in the ordering of the alphabetic codings used has also been removed. |
47 bits become 10 letters as follows:
0aaaaAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way) 10iiiAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way) 1100mAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way) 11010aaaaFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways) 11011aaaaAAAAAAAAAAGGGGGGCCCCCCCCCC 11100aaaaAAAAAAAAAABBBBBBBBBBHHHHHH 111010iiiFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways) 111011iiiAAAAAAAAAAGGGGGGCCCCCCCCCC 111100iiiAAAAAAAAAABBBBBBBBBBHHHHHH 11110100mFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways) 11110101mAAAAAAAAAAGGGGGGCCCCCCCCCC 11110110mAAAAAAAAAABBBBBBBBBBHHHHHH 11110111aaaaPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways) 11111000aaaaAAAAAAAAAAQQQCCCCCCCCCC 11111001aaaaAAAAAAAAAABBBBBBBBBBRRR 111110100iiiPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways) 111110101iiiAAAAAAAAAAQQQCCCCCCCCCC 111110110iiiAAAAAAAAAABBBBBBBBBBRRR 111110111aaaaFFFFFFGGGGGGCCCCCCCCCC (10,10,14, 3 ways) 111111000aaaaFFFFFFBBBBBBBBBBHHHHHH 111111001aaaaAAAAAAAAAAGGGGGGHHHHHH 1111110100iiiFFFFFFGGGGGGCCCCCCCCCC (10,10,14, 3 ways) 1111110101iiiFFFFFFBBBBBBBBBBHHHHHH 1111110110iiiAAAAAAAAAAGGGGGGHHHHHH 1111110111aaaaXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways) 1111111000aaaaAAAAAAAAAAYCCCCCCCCCC 1111111001aaaaAAAAAAAAAABBBBBBBBBBZ 11111110100mPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways) 11111110101mAAAAAAAAAAQQQCCCCCCCCCC 11111110110mAAAAAAAAAABBBBBBBBBBRRR 11111110111iiiXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways) 11111111000iiiAAAAAAAAAAYCCCCCCCCCC 11111111001iiiAAAAAAAAAABBBBBBBBBBZ 111111110100aaaaPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways) 111111110101aaaaPPPBBBBBBBBBBHHHHHH 111111110110aaaaFFFFFFQQQCCCCCCCCCC 111111110111aaaaAAAAAAAAAAQQQHHHHHH 111111111000aaaaFFFFFFBBBBBBBBBBRRR 111111111001aaaaAAAAAAAAAAGGGGGGRRR 1111111110100mXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways) 1111111110101mAAAAAAAAAAYCCCCCCCCCC 1111111110110mAAAAAAAAAABBBBBBBBBBZ 1111111110111iiiPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways) 1111111111000iiiPPPBBBBBBBBBBHHHHHH 1111111111001iiiFFFFFFQQQCCCCCCCCCC 1111111111010iiiAAAAAAAAAAQQQHHHHHH 1111111111011iiiFFFFFFBBBBBBBBBBRRR 1111111111100iiiAAAAAAAAAAGGGGGGRRR 1111111111101aaaaFFFFFFGGGGGGHHHHHH (10,10,10, 1 way) 11111111111100iiiFFFFFFGGGGGGHHHHHH (10,10,10, 1 way) 111111111111010mPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways) 111111111111011mPPPBBBBBBBBBBHHHHHH 111111111111100mFFFFFFQQQCCCCCCCCCC 111111111111101mAAAAAAAAAAQQQHHHHHH 111111111111110mFFFFFFBBBBBBBBBBRRR 111111111111111mAAAAAAAAAAGGGGGGRRR
Here is an example, to help illustrate the above.
The 47-bit string
11100011100011100011100011100011100011100011100
will code as follows:
11100 aaaa (14-bits/3 lt) (14-bits/3 lt) (10 bits ) 0111 00011100011100 01110001110001 1100011100 H 00aaaabbbbcccc 011aaaajjjcccc 110iiijjjo H B M I X B R WY
to produce HHBMI XBRWY as its representation.
Attempting to produce a direct coding, using the aaaa/iii/m symbols for individual letters, produces a very long list of codes. This nested approach is considerably more manageable.
The description of a direct coding would begin like this:
0000000aaaabbbbccccddddeeeeffffgggghhhhiiiijjjj 00000010iiibbbbccccddddeeeeffffgggghhhhiiiijjjj 00000011aaaajjjccccddddeeeeffffgggghhhhiiiijjjj
although, doubtless, different letters would be used, to avoid confusion between iiii and iii, although an alphabet of thirty letters would be required to achieve the effect that might be desired.
As noted, attempting to proceed directly to a coding of 47 bits to 10 letters would be complicated. Thus, the coding above used a coding of 14 bits to 3 letters as the starting point, with additional codings of fewer bits to 3 letters to allow all possible binary values for the original 47 bits to be covered.
There are 676 possible combinations of two letters, which is enough to cover all 512 possible values of nine binary bits. Could this be used as a starting point? Since ten is divisible by two, but not by three, there would be no need to encode one letter singly, as was done above; the regularity produced by this might offset the complexity produced by using a smaller basic unit.
Nine binary bits can be encoded in two letters as follows:
0aaaabbbb 10iiibbbb 11aaaajjj
Seven binary bits can be encoded in some of the remaining combinations like this:
0iiijjj 10mbbbb 11aaaan
Five binary bits can then be encoded as follows:
0mjjj 1iiin
and, finally, two binary bits can be encoded in the remaining possibility:
mn
We can represent the codings of nine binary bits to two letters as AAAAAAAA, BBBBBBBB, CCCCCCCC, DDDDDDDD and EEEEEEEE; the codings of seven binary bits to two letters as IIIIII, JJJJJJ, KKKKKK, LLLLLL and MMMMMM; the codings of five binary bits to two letters as PPPP, QQQQ, RRRR, SSSS and TTTT; and the codings of two binary bits to two letters as V, W, X, Y and Z.
An overall coding of 47 bits to 10 letters of this type might begin like this:
00AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (9,9,9,9,9) 0100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (7,9,9,9,9, 5 ways) ... 1000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDMMMMMM 100100PPPPBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (5,9,9,9,9, 5 ways) ... 101000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDTTTT 101001IIIIIIJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (7,7,9,9,9, 10 ways) ... 101100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM 101101AAAAAAAAJJJJJJKKKKKKDDDDDDDDEEEEEEEE ... 101111AAAAAAAAJJJJJJCCCCCCCCDDDDDDDDMMMMMM 110000AAAAAAAABBBBBBBBKKKKKKLLLLLLEEEEEEEE 110001AAAAAAAABBBBBBBBKKKKKKDDDDDDDDMMMMMM 110010AAAAAAAABBBBBBBBCCCCCCCCLLLLLLMMMMMM 11001100PPPPJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (5,7,9,9,9, 20 ways) ... 11001111PPPPBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM 11010000IIIIIIQQQQCCCCCCCCDDDDDDDDEEEEEEEE 11010001AAAAAAAAQQQQKKKKKKDDDDDDDDEEEEEEEE ... 11010011AAAAAAAAQQQQCCCCCCCCDDDDDDDDMMMMMM 11010100IIIIIIBBBBBBBBRRRRDDDDDDDDEEEEEEEE 11010101AAAAAAAAJJJJJJRRRRDDDDDDDDEEEEEEEE 11010110AAAAAAAABBBBBBBBRRRRLLLLLLEEEEEEEE 11010111AAAAAAAABBBBBBBBRRRRDDDDDDDDMMMMMM 11011000IIIIIIBBBBBBBBCCCCCCCCSSSSEEEEEEEE ... 11011010AAAAAAAABBBBBBBBKKKKKKSSSSEEEEEEEE 11011011AAAAAAAABBBBBBBBCCCCCCCCSSSSMMMMMM 11011100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDTTTT ... 11011111AAAAAAAABBBBBBBBCCCCCCCCLLLLLLTTTT 11100000IIIIIIJJJJJJKKKKKKDDDDDDDDEEEEEEEE (7,7,7,9,9, 10 ways) ... 11100010IIIIIIJJJJJJCCCCCCCCDDDDDDDDMMMMMM 11100011IIIIIIBBBBBBBBKKKKKKLLLLLLEEEEEEEE 11100100IIIIIIBBBBBBBBKKKKKKDDDDDDDDMMMMMM 11100101IIIIIIBBBBBBBBCCCCCCCCLLLLLLMMMMMM 11100110AAAAAAAAJJJJJJKKKKKKLLLLLLEEEEEEEE ... 11101001AAAAAAAABBBBBBBBKKKKKKLLLLLLMMMMMM 111010100VBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (2,9,9,9,9, 5 ways) ... 111011000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDZ 1110110010PPPPQQQQCCCCCCCCDDDDDDDDEEEEEEEE (5,5,9,9,9, 10 ways) ... 1110110101PPPPBBBBBBBBCCCCCCCCDDDDDDDDTTTT 1110110110AAAAAAAAQQQQRRRRDDDDDDDDEEEEEEEE ... 1110111000AAAAAAAAQQQQCCCCCCCCDDDDDDDDTTTT 1110111001AAAAAAAABBBBBBBBRRRRSSSSEEEEEEEE 1110111010AAAAAAAABBBBBBBBRRRRDDDDDDDDTTTT 1110111011AAAAAAAABBBBBBBBCCCCCCCCSSSSTTTT 1110111100PPPPJJJJJJKKKKKKDDDDDDDDEEEEEEEE (5,7,7,9,9, 30 ways) ... 1110111110PPPPJJJJJJCCCCCCCCDDDDDDDDMMMMMM 1110111111PPPPBBBBBBBBKKKKKKLLLLLLEEEEEEEE ... 1111000001PPPPBBBBBBBBCCCCCCCCLLLLLLMMMMMM 1111000010IIIIIIQQQQKKKKKKDDDDDDDDEEEEEEEE ... 1111000100IIIIIIQQQQCCCCCCCCDDDDDDDDMMMMMM 1111000101AAAAAAAAQQQQKKKKKKLLLLLLEEEEEEEE ... 1111000111AAAAAAAAQQQQCCCCCCCCLLLLLLMMMMMM 1111001000IIIIIIJJJJJJRRRRDDDDDDDDEEEEEEEE 1111001001IIIIIIBBBBBBBBRRRRLLLLLLEEEEEEEE 1111001010IIIIIIBBBBBBBBRRRRDDDDDDDDMMMMMM 1111001011AAAAAAAAJJJJJJRRRRLLLLLLEEEEEEEE 1111001100AAAAAAAAJJJJJJRRRRDDDDDDDDMMMMMM 1111001101AAAAAAAABBBBBBBBRRRRLLLLLLMMMMMM 1111001110IIIIIIJJJJJJCCCCCCCCSSSSEEEEEEEE 1111001111IIIIIIBBBBBBBBKKKKKKSSSSEEEEEEEE 1111010000IIIIIIBBBBBBBBCCCCCCCCSSSSMMMMMM 1111010001AAAAAAAAJJJJJJKKKKKKSSSSEEEEEEEE 1111010010AAAAAAAAJJJJJJCCCCCCCCSSSSMMMMMM 1111010011AAAAAAAABBBBBBBBKKKKKKSSSSMMMMMM 1111010100IIIIIIJJJJJJCCCCCCCCDDDDDDDDTTTT ... 1111010110IIIIIIBBBBBBBBCCCCCCCCLLLLLLTTTT 1111010111AAAAAAAAJJJJJJKKKKKKDDDDDDDDTTTT 1111011000AAAAAAAAJJJJJJCCCCCCCCLLLLLLTTTT 1111011001AAAAAAAABBBBBBBBKKKKKKLLLLLLTTTT 1111011010IIIIIIJJJJJJKKKKKKLLLLLLEEEEEEEE (7,7,7,7,9, 5 ways) ... 1111011110AAAAAAAAJJJJJJKKKKKKLLLLLLMMMMMM 11110111110VJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (2,7,9,9,9, 20 ways) ... 11111000001VBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM 11111000010IIIIIIWCCCCCCCCDDDDDDDDEEEEEEEE 11111000011AAAAAAAAWKKKKKKDDDDDDDDEEEEEEEE ... 11111000101AAAAAAAAWCCCCCCCCDDDDDDDDMMMMMM 11111000110IIIIIIBBBBBBBBXDDDDDDDDEEEEEEEE 11111000111AAAAAAAAJJJJJJXDDDDDDDDEEEEEEEE 11111001000AAAAAAAABBBBBBBBXLLLLLLEEEEEEEE 11111001001AAAAAAAABBBBBBBBXDDDDDDDDMMMMMM 11111001010IIIIIIBBBBBBBBCCCCCCCCYEEEEEEEE ... 11111001100AAAAAAAABBBBBBBBKKKKKKYEEEEEEEE 11111001101AAAAAAAABBBBBBBBCCCCCCCCYMMMMMM 11111001110IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDZ ... 11111010001AAAAAAAABBBBBBBBCCCCCCCCLLLLLLZ 111110100100PPPPQQQQKKKKKKDDDDDDDDEEEEEEEE (5,5,7,9,9, 30 ways) ... 111110100110PPPPQQQQCCCCCCCCDDDDDDDDMMMMMM 111110100111PPPPJJJJJJRRRRDDDDDDDDEEEEEEEE 111110101000PPPPBBBBBBBBRRRRLLLLLLEEEEEEEE 111110101001PPPPBBBBBBBBRRRRDDDDDDDDMMMMMM 111110101010PPPPJJJJJJCCCCCCCCSSSSEEEEEEEE 111110101011PPPPBBBBBBBBKKKKKKSSSSEEEEEEEE 111110101100PPPPBBBBBBBBCCCCCCCCSSSSMMMMMM 111110101101PPPPJJJJJJCCCCCCCCDDDDDDDDTTTT ... 111110101111PPPPBBBBBBBBCCCCCCCCLLLLLLTTTT 111110110000IIIIIIQQQQRRRRDDDDDDDDEEEEEEEE 111110110001AAAAAAAAQQQQRRRRLLLLLLEEEEEEEE 111110110010AAAAAAAAQQQQRRRRDDDDDDDDMMMMMM 111110110011IIIIIIQQQQCCCCCCCCSSSSEEEEEEEE 111110110100AAAAAAAAQQQQKKKKKKSSSSEEEEEEEE 111110110101AAAAAAAAQQQQCCCCCCCCSSSSMMMMMM 111110110110IIIIIIQQQQCCCCCCCCDDDDDDDDTTTT 111110110111AAAAAAAAQQQQKKKKKKDDDDDDDDTTTT 111110111000AAAAAAAAQQQQCCCCCCCCLLLLLLTTTT 111110111001IIIIIIBBBBBBBBRRRRSSSSEEEEEEEE 111110111010AAAAAAAAJJJJJJRRRRSSSSEEEEEEEE 111110111011AAAAAAAABBBBBBBBRRRRSSSSMMMMMM 111110111100IIIIIIBBBBBBBBRRRRDDDDDDDDTTTT 111110111101AAAAAAAAJJJJJJRRRRDDDDDDDDTTTT 111110111110AAAAAAAABBBBBBBBRRRRLLLLLLTTTT 111110111111IIIIIIBBBBBBBBCCCCCCCCSSSSTTTT ... 111111000001AAAAAAAABBBBBBBBKKKKKKSSSSTTTT 111111000010PPPPJJJJJJKKKKKKLLLLLLEEEEEEEE (5,7,7,7,9, 20 ways) ... 111111000101PPPPBBBBBBBBKKKKKKLLLLLLMMMMMM 111111000110IIIIIIQQQQKKKKKKLLLLLLEEEEEEEE ... 111111001000IIIIIIQQQQCCCCCCCCLLLLLLMMMMMM 111111001001AAAAAAAAQQQQKKKKKKLLLLLLMMMMMM 111111001010IIIIIIJJJJJJRRRRLLLLLLEEEEEEEE 111111001011IIIIIIJJJJJJRRRRDDDDDDDDMMMMMM 111111001100IIIIIIBBBBBBBBRRRRLLLLLLMMMMMM 111111001101AAAAAAAAJJJJJJRRRRLLLLLLMMMMMM 111111001110IIIIIIJJJJJJKKKKKKSSSSEEEEEEEE 111111001111IIIIIIJJJJJJCCCCCCCCSSSSMMMMMM ... 111111010001AAAAAAAAJJJJJJKKKKKKSSSSMMMMMM 111111010010IIIIIIJJJJJJKKKKKKDDDDDDDDTTTT ... 111111010110AAAAAAAAJJJJJJKKKKKKLLLLLLTTTT 111111010111IIIIIIJJJJJJKKKKKKLLLLLLMMMMMM (7,7,7,7,7) 1111110110000VQQQQCCCCCCCCDDDDDDDDEEEEEEEE (2,5,9,9,9, 20 ways) ... 1111110110011VBBBBBBBBCCCCCCCCDDDDDDDDTTTT
with the coding condensed by omitting intermediate stages in which the last group of two letters representing a smaller number of bits moves through the coding from left to right, or the first group of two letters representing a larger number of bits moves through the coding from right to left.
Such a coding seems, however, to be even more long and complicated than the first example.
Could a simplification still be derived, however, from replacing the single letter, and the first group of three letters, by two groups of two letters? This even suggests attempting to code a given number of bits directly into a group of five letters. If an additional level of hierarchy is introduced into the scheme, it may be possible to keep the list of possibilities at any one level of the hierarchy within limits.
Although the same capital letter may be used to represent one method of encoding bits into two letters, and one method of encoding bits into three letters, the same letters are used as the starting points of groups, so a possibility for two letters might be the same a possibility for three letters in the first position, but not in the second position.
The codings of two letters to 9, 7, 5, and 2 bits respectively, are represented by AAAAAAAA, IIIIII, PPPP, and V; the codings of three letters to 14, 10, 7, and 5 bits (the possibility of representing three bits in 3 letters can still safely be left as unused, because this now affects only two parts of the coding, and it was possible to leave it unused in three parts before) are represented by BBBBBBBBBB, GGGGGG, QQQ, and Y respectively.
Thus, 23 bits would be encoded into five letters by the scheme denoted as:
AAAAAAAABBBBBBBBBB
One can also represent 21 bits by the scheme:
IIIIIIBBBBBBBBBB
and then one can represent 20 bits by the scheme:
0PPPPBBBBBBBBBB 1AAAAAAAAGGGGGG
Eighteen bits are represented by:
0IIIIIIGGGGGG 10VBBBBBBBBBB 11AAAAAAAAQQQ
Sixteen bits are represented by:
0PPPPGGGGGG 10IIIIIIQQQ 11AAAAAAAAYand thirteen bits are represented by:
0PPPPQQQ 1IIIIIIY
Let us represent the codings of 23, 21, 20, 18, 16 and 13 bits to five letters by aaaaaaaaaaa and bbbbbbbbbbb, fffffffff and ggggggggg, iiiiiiii and jjjjjjjj, pppppp and qqqqqq, uuuu and vvvv, and x and y, respectively.
Then, one can represent 47 bits by two five-letter groups by the following coding:
0aaaaaaaaaaabbbbbbbbbbb 100fffffffffbbbbbbbbbbb 101aaaaaaaaaaaggggggggg 1100iiiiiiiibbbbbbbbbbb 1101aaaaaaaaaaajjjjjjjj 11100fffffffffggggggggg 111010ppppppbbbbbbbbbbb 111011aaaaaaaaaaaqqqqqq 111100iiiiiiiiggggggggg 111101fffffffffjjjjjjjj 1111100iiiiiiiijjjjjjjj 11111010uuuubbbbbbbbbbb 11111011aaaaaaaaaaavvvv 11111100ppppppggggggggg 11111101fffffffffqqqqqq 111111100ppppppjjjjjjjj 111111101iiiiiiiiqqqqqq 1111111100uuuuggggggggg 1111111101fffffffffvvvv 11111111100xbbbbbbbbbbb 11111111101aaaaaaaaaaay 11111111110uuuujjjjjjjj 11111111111iiiiiiiivvvv
and thus, some simplicity is gained by having an additional level of the hierarchy, and, as a bonus, it becomes possible to convert binary messages of some lengths to output having an odd number of five-letter groups.